Winter-Soren
Contributor

What was wrong?

The execution of the circuit-relay example hangs indefinitely when attempting to create a new stream on a successfully established relay connection. The code connects to the relay node and establishes a connection to the destination through the relay, but then freezes when trying to open a stream on that connection without any error messages or exceptions being thrown.

Issue #694

How was it fixed?

Loggers were added to aid debugging. The indefinite hanging issue has been resolved, but now the execution halts when the CONNECT message is sent.

Summary of approach.

To-Do

  • Clean up commit history
  • Add or update documentation related to these changes
  • Add entry to the release notes

@seetadev
Contributor

@Winter-Soren : Hi Soham. Thank you for submitting the PR. Appreciate it.

It would be great if you could fix the CI/CD issues. Circuit relay is urgently needed for the universal connectivity dapp for py-libp2p and also for the workshop branch.

@seetadev
Contributor

seetadev commented Jul 19, 2025

@guha-rahul, @sukhman-sukh, @lla-dane: It would be great if you could test circuit relay on your devices in parallel.

@seetadev
Contributor

@Winter-Soren: Great, thanks Soham. Please add test cases covering specific NAT traversal scenarios and add a newsfragment. We will do a final review and merge after collective testing. Appreciate your efforts.

@Winter-Soren
Contributor Author

Hi @lla-dane and @guha-rahul,

I’d love to get your review on the example.
Regarding the bug, I’ve removed the indefinite hanging in the dial_peer_info method. When the source peer sends the CONNECT message, the relay node still fails to accept it. I tried adding additional protocol handlers, but the issue persists.

Would appreciate it if you could take a fresh look and suggest any improvements.

@sukhman-sukh
Contributor

Hey @Winter-Soren, Luca made some more fixes on yamux. Can you rebase and test again? Maybe that could fix the issue.

@lla-dane
Contributor

lla-dane commented Jul 22, 2025

> @guha-rahul, @sukhman-sukh, @lla-dane: It would be great if you could test circuit relay on your devices in parallel.

On my end, it fails when the source peer attempts to dial the destination.
Source peer logs:

2025-07-22 11:37:27,592 | circuit-relay-example | INFO | Attempting to dial destination 16Uiu2HAkvHVvWuyrMgYYq6uDirGSrCvM1HfhmrD1ajjZZ8fkArGh through relay 16Uiu2HAmRJmmU71BgGafq2Ee9ZH1y4v6zH8Es5VAmvfWNBmcXWDV
Error making reservation: 
Failed to make reservation with relay 16Uiu2HAmRJmmU71BgGafq2Ee9ZH1y4v6zH8Es5VAmvfWNBmcXWDV
2025-07-22 11:37:57,627 | circuit-relay-example | ERROR | Failed to dial through relay: Failed to establish relay connection: 
2025-07-22 11:37:57,627 | circuit-relay-example | ERROR | Exception type: ConnectionError
2025-07-22 11:37:57,627 | circuit-relay-example | ERROR | Error: Failed to establish relay connection: 

Source operation completed

@guha-rahul
Contributor

> @guha-rahul, @sukhman-sukh, @lla-dane: It would be great if you could test circuit relay on your devices in parallel.

I am getting the same error as @lla-dane. A quick question: should we be sending raw protobuf messages, given that other places use variants?

@seetadev
Contributor

seetadev commented Jul 22, 2025

> I am getting the same error as @lla-dane. A quick question: should we be sending raw protobuf messages, given that other places use variants?

@guha-rahul : Thanks for raising this—and great question!

You're right to observe the difference here. In some parts of the codebase, especially when interfacing with protocols like Identify or PeerRecord, we use protobuf variants or length-prefixed wrappers to ensure compatibility with libp2p's framing expectations. However, in other cases—especially when we're doing low-level testing or working within a tightly scoped context—we send raw protobuf messages directly over the stream.

In the specific scenario you’re encountering (same as @lla-dane’s), sending raw protobuf messages is expected unless the protocol explicitly requires a variant or length prefix for parsing. If you're getting an error, it might be related to how the message is framed or interpreted on the receiving end. Feel free to share a snippet or the stack trace; happy to ask @Winter-Soren, @sukhman-sukh and @lla-dane to help debug further.

We’re working towards clearer abstractions for these different framing cases to avoid this confusion going forward. Appreciate your attention to detail.
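
For reference, the difference between a raw message and a length-prefixed one can be sketched as below. This is a minimal illustration, assuming "variants" above refers to unsigned-varint length prefixes; the helper names (encode_uvarint, frame, read_frame) are hypothetical and are not py-libp2p APIs, and the payload bytes are a placeholder rather than a real circuit relay message.

```python
import io


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if n < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def frame(payload: bytes) -> bytes:
    """Prefix a payload with its length so the reader knows where it ends."""
    return encode_uvarint(len(payload)) + payload


def read_frame(stream: io.BufferedIOBase) -> bytes:
    """Read one length-prefixed payload from a byte stream."""
    length = 0
    shift = 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError("stream closed while reading the length prefix")
        length |= (raw[0] & 0x7F) << shift
        if not raw[0] & 0x80:
            break
        shift += 7
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("stream closed before the full payload arrived")
    return payload


# Placeholder bytes standing in for a serialized protobuf message.
message = b"\x08\x01"
wire = frame(message) + frame(b"x" * 300)

reader = io.BytesIO(wire)
first = read_frame(reader)
second = read_frame(reader)
```

Without the prefix, a receiver reading two back-to-back messages has no way to tell where the first ends, so both sides of a stream need to agree on the same framing.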

@sumanjeet0012
Contributor

@Winter-Soren I have tried running the code and found some issues:

  1. The circuit relay example file only prints its own logs.

logger = logging.getLogger("circuit-relay-example")

We are currently unable to access the actual debug logs of the circuit relays.
It is essential to have logs that provide real-time information, including:

  • Whether our node is connected to another host
  • Whether the connected host is a relay server
  • Whether our node is able to successfully make a reservation on the relay server
  2. Relay Discovery Issue:

In the setup destination function we use:

protocol = CircuitV2Protocol(host, limits=limits, allow_hop=False)

and after that we start it as a background service.

async with background_trio_service(protocol):

which immediately starts searching for relay servers:

nursery.start_soon(self.discover_relays)

But at that point no hosts are connected yet, because the connection is only made afterwards:

await host.connect(relay_info)
logger.info(f"Connected to relay {relay_info.peer_id}")

This host connection is the last step in the function.

So, in summary, the sequence when setting up the destination node is as follows:
nursery.start_soon(self.discover_relays) checks whether any connected host is a relay server before we have connected to the relay, so the first discovery pass finds no relay server.
The next discovery pass is only scheduled after DEFAULT_DISCOVERY_INTERVAL (which is set to 60 seconds). Before it runs, other functions may time out or the test may already have finished.
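
The ordering problem described above can be reproduced in isolation. The following is a rough simulation only: it uses asyncio instead of trio so it runs with the standard library alone, and FakeHost, discover_relays and run are hypothetical stand-ins, not the actual py-libp2p classes.

```python
import asyncio

DISCOVERY_INTERVAL = 60  # seconds; mirrors the DEFAULT_DISCOVERY_INTERVAL mentioned above


class FakeHost:
    """Hypothetical stand-in for a libp2p host; it only tracks connected peers."""

    def __init__(self) -> None:
        self.connected: set[str] = set()

    async def connect(self, peer_id: str) -> None:
        await asyncio.sleep(0)  # yield once, as a real network call would
        self.connected.add(peer_id)


async def discover_relays(host: FakeHost, found: list[set[str]]) -> None:
    """Snapshot the connected peers on every pass, then sleep for the interval."""
    while True:
        found.append(set(host.connected))
        await asyncio.sleep(DISCOVERY_INTERVAL)


async def run(connect_first: bool) -> set[str]:
    """Return what the first discovery pass saw."""
    host = FakeHost()
    found: list[set[str]] = []
    if connect_first:
        await host.connect("relay")
    task = asyncio.create_task(discover_relays(host, found))
    await asyncio.sleep(0)  # let the first discovery pass run
    if not connect_first:
        await host.connect("relay")
    await asyncio.sleep(0.01)  # the example finishes long before the next pass
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return found[0]


early = asyncio.run(run(connect_first=False))
fixed = asyncio.run(run(connect_first=True))
```

Here early is empty because the first pass ran before the connection existed, while fixed contains the relay because the connection was made first.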

Proposed Solution:

  1. Let's use the root logger in the circuit relay example file to see the complete picture.
  2. Connect to the relay server before running the nursery.start_soon(self.discover_relays) function.
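
A minimal sketch of the first point, using only the standard logging module. The logger name "libp2p.relay.circuit_v2" is an assumption used for illustration; the actual logger names inside the library may differ.

```python
import logging

# Configure the root logger so records from every module are emitted,
# not only those from the example's own named logger.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s | %(name)s | %(levelname)s | %(message)s",
    force=True,  # replace any handlers that were configured earlier
)

example_logger = logging.getLogger("circuit-relay-example")
# Hypothetical library logger name, for illustration only.
relay_logger = logging.getLogger("libp2p.relay.circuit_v2")

example_logger.info("Connected to relay")
relay_logger.debug("reservation request sent")
```

Because named loggers inherit their effective level from the root logger unless one is set on them explicitly, debug records from library modules become visible alongside the example's own output.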

This solution may not fully resolve all of the existing issues; however, it may represent a significant step toward addressing the problem.

What are your thoughts on this, @Winter-Soren?

@seetadev
Contributor

@Winter-Soren : Thank you for making improvements. Appreciate it.

Please resolve the CI/CD issues.

@sukhman-sukh
Contributor

Hey @Winter-Soren @seetadev, I have fixed the reservation and the hop/stop connection through the relay in this PR: Winter-Soren#1. I have opened it as a pull request against this ongoing PR. Please review and merge.

@sukhman-sukh
Contributor

sukhman-sukh commented Aug 30, 2025

Also, I have a few doubts:

  1. I have commented out this line for now because I could not understand why it is needed. Can you please help me with this: https://github.com/Winter-Soren/py-libp2p/pull/1/files#diff-6c1b8cf653e4a4a56ef93a339a502035ad1d0abc59fc6c018834c189541b98afR425-R444
  2. Right now, I am unable to send a message over the relay. The connection is established, but I cannot create a stream on top of it. I am using stream = await connection.new_stream("/echo/1.0.0") from /docs/examples.circuit_relay.rst, but it fails because the connection doesn't have a new_stream function.
  3. Also,
async def _read_stream_with_retry(
    self,
    stream: INetStream,
    max_retries: int = MAX_READ_RETRIES,
) -> bytes | None:

fails if no message is sent on the stream immediately. I think it should wait until the connection closes rather than give up after 2 or 3 tries, which are bound to fail if the other side is not sending anything.
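
To illustrate the third point, here is a small sketch contrasting the two behaviours. It is an assumption about how a fixed-retry read behaves, not the actual _read_stream_with_retry implementation, and it uses asyncio.StreamReader instead of a trio-based INetStream so that it runs with the standard library alone. The function names are hypothetical.

```python
import asyncio


async def read_with_retry(reader, max_retries=3, delay=0.01):
    """Fixed-retry style: give up if nothing arrives within a few short attempts."""
    for _ in range(max_retries):
        try:
            return await asyncio.wait_for(reader.read(1024), timeout=delay)
        except asyncio.TimeoutError:
            continue
    return None


async def read_until_data_or_close(reader):
    """Wait for data for as long as the stream stays open; return None on EOF."""
    data = await reader.read(1024)
    return data or None


async def demo():
    # A peer that only sends its message after 0.2 seconds.
    slow = asyncio.StreamReader()
    asyncio.get_running_loop().call_later(0.2, slow.feed_data, b"hello")

    retried = await read_with_retry(slow)  # all retries expire before data arrives
    waited = await read_until_data_or_close(slow)  # blocks until the data shows up

    # A stream that the other side has already closed.
    closed = asyncio.StreamReader()
    closed.feed_eof()
    at_eof = await read_until_data_or_close(closed)
    return retried, waited, at_eof


retried, waited, at_eof = asyncio.run(demo())
```

In practice an overall timeout around the waiting read would probably still be wanted, so that a peer which never sends and never closes cannot block the caller forever.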

6 participants